Abstract

Often semiparametric estimators are asymptotically equivalent to a sample average. The
object being averaged is referred to as the influence function. The influence function is useful
in formulating primitive regularity conditions for asymptotic normality, in efficiency
comparisons, for bias reduction, and for analyzing robustness. We show that the influence
function of a semiparametric estimator can be calculated as the limit of the Gateaux derivative
of a parameter with respect to a smooth deviation as the deviation approaches a point
mass. We also consider high level and primitive regularity conditions for validity of the
influence function calculation. The conditions involve Frechet differentiability, nonparametric
convergence rates, stochastic equicontinuity, and small bias conditions. We apply these
results to examples.
1 Introduction


Often semiparametric estimators are asymptotically equivalent to a sample average. The object 
being averaged is referred to as the influence function. The influence function is useful for a 
number of purposes. Its variance is the asymptotic variance of the estimator and so it can be 
used for asymptotic efficiency comparisons. Also, the form of the remainder terms follows from the
form of the influence function, so knowing the influence function should be a good starting point
for finding regularity conditions. In addition, estimators of the influence function can be used to 
reduce bias of a semiparametric estimator. Furthermore, the influence function approximately 
gives the influence of a single observation on the estimator. Indeed this interpretation is where 
the influence function gets its name in the robust estimation literature, see Hampel (1968, 
1974). 

We show how the influence function of a semiparametric estimator can be calculated from 
the functional given by the limit of the semiparametric estimator. We show that the influence 
function is the limit of the Gateaux derivative of the functional with respect to a smooth 
deviation from the true distribution, as the deviation approaches a point mass. This calculation 
is similar to that of Hampel (1968, 1974), except that the deviation from the true distribution 
is restricted to be smooth. Smoothness of the deviation is necessary when the domain of the 
functional is restricted to smooth functions. As the deviation approaches a point mass the 
derivative with respect to it approaches the influence function. This calculation applies to 
many semiparametric estimators that are not defined for point mass deviations, such as those 
that depend on nonparametric estimators of densities and conditional expectations. 

We also consider regularity conditions for validity of the influence function calculation. The
conditions involve Frechet differentiability as well as convergence rates for nonparametric
estimators. They also involve stochastic equicontinuity and small bias conditions. When estimators
depend on nonparametric objects like conditional expectations and pdfs, the Frechet
differentiability condition is generally satisfied for intuitive norms, as is well known from Goldstein
and Messer (1992). The situation is different for functionals of the empirical distribution, where
Frechet differentiability is only known to hold under special norms; see Dudley (1994). The
asymptotic theory here also differs from that for functionals of the empirical distribution in other
ways, as will be discussed below.

Newey (1994) previously showed that the influence function of a semiparametric estimator 
can be obtained by solving a pathwise derivative equation. That approach has proven useful 
in many settings but does require solving a functional equation in some way. The approach 
of this paper corresponds to specifying a path so that the influence function can be calculated directly
from the derivative. This approach eliminates the necessity of finding a solution to a functional
equation.

Regularity conditions for functionals of nonparametric estimators involving Frechet
differentiability have previously been formulated by Ait-Sahalia (1991), Goldstein and Messer (1992),
Newey and McFadden (1994), Newey (1994), Chen and Shen (1998), Chen, Linton, and
Keilegom (2003), and Ichimura and Lee (2010), among others. Newey (1994) gave stochastic
equicontinuity and small bias conditions for functionals of series estimators. In this paper we
update those using Belloni, Chernozhukov, Chetverikov, and Kato (2015). Bickel and Ritov
(2003) formulated similar conditions for kernel estimators. Andrews (1994) gave stochastic
equicontinuity conditions for the more general setting of GMM estimators that depend on
nonparametric estimators.

In Section 2 we describe the estimators we consider. Section 3 presents the method for 
calculating the influence function. In Section 4 we outline some conditions for validity of the 
influence function calculation. Section 5 gives primitive conditions for linear functionals of 
kernel density and series regression estimators. Section 6 outlines additional conditions for 
semiparametric GMM estimators. Section 7 concludes. 

2 Semiparametric Estimators 

The subject of this paper is estimators of parameters that depend on unknown functions such 
as probability densities or conditional expectations. We consider estimators of these parameters 
based on nonparametric estimates of the unknown functions. We refer to these estimators as 
semiparametric, with the understanding that they depend on nonparametric estimators. We 
could also refer to them as “plug in estimators” or more precisely as “plug in estimators that 
have an influence function.” This terminology seems awkward though, so we simply refer to 
them as semiparametric estimators. We denote such an estimator by $\hat\beta$, which is a function
of the data $z_1, \ldots, z_n$, where $n$ is the number of observations. Throughout the paper we will
assume that the data observations $z_i$ are i.i.d. We denote the object that $\hat\beta$ estimates as $\beta_0$,
the subscript referring to the parameter value under the distribution that generated the data.

Some examples can help fix ideas. One example with a long history is the integrated squared
density, where $\beta_0 = \int f_0(z)^2\,dz$, $z_i$ has pdf $f_0(z)$, and $z$ is $r$-dimensional. This object is useful
in certain testing settings. A variety of different estimators of $\beta_0$ have been suggested. One
estimator is based on a kernel estimator $\hat f(z)$ of the density given by

$$\hat f(z) = \frac{1}{n h^r} \sum_{i=1}^{n} K\left(\frac{z - z_i}{h}\right), \qquad \int K(u)\,du = 1,$$

where $h$ is a bandwidth and $K(u)$ is a kernel. An estimator $\hat\beta$ can then be constructed by
plugging in $\hat f$ in place of $f_0$ in the formula for $\beta_0$ as

$$\hat\beta = \int \hat f(z)^2\,dz.$$

This estimator of $\beta_0$ and other estimators have been previously considered by many others. We
use it as an example to help illustrate the results of this paper.
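
As a minimal sketch of the plug-in calculation above, the following Python code evaluates $\hat f$ and $\hat\beta$ for scalar data ($r = 1$). The Epanechnikov kernel, the midpoint-rule integration, the bandwidth, and the simulated standard normal sample are illustrative assumptions and are not choices made in the paper.

```python
import math
import random


def epanechnikov(u):
    """Kernel K(u) = 0.75 (1 - u^2) on [-1, 1]; it integrates to one."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_density(z, data, h):
    """f_hat(z) = (n h)^{-1} sum_i K((z - z_i) / h) for scalar data (r = 1)."""
    return sum(epanechnikov((z - zi) / h) for zi in data) / (len(data) * h)


def plug_in_isd(data, h, grid_size=1000):
    """beta_hat = integral of f_hat(z)^2 dz, approximated by the midpoint rule."""
    lo, hi = min(data) - h, max(data) + h  # f_hat is zero outside [lo, hi]
    step = (hi - lo) / grid_size
    total = 0.0
    for k in range(grid_size):
        z = lo + (k + 0.5) * step
        total += kernel_density(z, data, h) ** 2
    return total * step


# Illustrative data: a standard normal sample, for which
# beta_0 = integral of f_0(z)^2 dz = 1 / (2 sqrt(pi)).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]
beta_hat = plug_in_isd(sample, h=0.4)
beta_0 = 1.0 / (2.0 * math.sqrt(math.pi))
print(f"beta_hat = {beta_hat:.4f}, beta_0 = {beta_0:.4f}")
```

The integration range is chosen from the data because a kernel with bounded support makes $\hat f$ vanish more than one bandwidth away from the sample.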

It is known that there are other estimators that are better than $\hat\beta$. One of these is

$$\tilde\beta = \frac{1}{n} \sum_{i=1}^{n} \hat f_{-i}(z_i), \qquad \hat f_{-i}(z) = \frac{1}{(n-1) h^r} \sum_{j \neq i} K\left(\frac{z - z_j}{h}\right),$$

where $K(u)$ is a symmetric kernel. Giné and Nickl (2008) showed that this estimator converges
at optimal rates while it is well known that $\hat\beta$ does not. Our purpose in considering $\hat\beta$ is not to
suggest it as the best estimator but instead to use it to illustrate the results of this paper.
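
The leave-one-out estimator can be sketched in a few lines of Python for scalar data ($r = 1$). The name `beta_tilde`, the Epanechnikov kernel, the bandwidth, and the simulated sample are assumptions made only for this illustration.

```python
import random


def epanechnikov(u):
    """Symmetric kernel K(u) = 0.75 (1 - u^2) on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def leave_one_out_isd(data, h):
    """Average over i of the leave-one-out density estimate f_hat_{-i}(z_i)."""
    n = len(data)
    total = 0.0
    for i, zi in enumerate(data):
        for j, zj in enumerate(data):
            if j != i:  # observation i is excluded from its own density estimate
                total += epanechnikov((zi - zj) / h)
    return total / (n * (n - 1) * h)


# Illustrative data: a standard normal sample (beta_0 = 1 / (2 sqrt(pi))).
random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]
beta_tilde = leave_one_out_isd(sample, h=0.4)
print(f"beta_tilde = {beta_tilde:.4f}")
```

Unlike the plug-in estimator, this version needs no numerical integration, since it only evaluates the kernel at pairwise differences of the observations.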

Another example is based on the bound on average consumer surplus given in Hausman
and Newey (2015). Here a data observation is $z = (q, p, y)$ where $q$ is quantity of some good, $p$
is price, and $y$ is income. For $x = (p, y)$ the object of interest is

$$\beta_0 = \int W(x) d_0(x)\,dx, \qquad W(x) = w(y)\,1(p^0 < p < p^1)\,e^{-b(p - p^0)}, \qquad d_0(x) = E[q \mid x].$$

From Hausman and Newey (2015) it follows that this object is a bound on the weighted average
over income and individuals of average equivalent variation for a price change from $p^0$ to $p^1$
when there is general heterogeneity. It is an upper (or lower) bound for average surplus when
$b$ is a lower (or upper) bound for individual income effects. Here $w(y) > 0$ is a known weight
function that is used to average across income levels.

One estimator of $\beta_0$ can be obtained by plugging a series nonparametric regression
estimator of $d_0(x)$ into the formula for $\beta_0$. To describe a series estimator let
$p^K(x) = (p_{1K}(x), \ldots, p_{KK}(x))^T$ be a vector of approximating functions such as power series or
regression splines. Also let $P = [p^K(x_1), \ldots, p^K(x_n)]^T$ and $Q = (q_1, \ldots, q_n)^T$ be the matrix and
vector of observations on the approximating functions and on quantity. A series estimator of
$d_0(x) = E[q \mid x]$ is given by

$$\hat d(x) = p^K(x)^T \hat\gamma, \qquad \hat\gamma = \hat\Sigma^{-1} P^T Q / n, \qquad \hat\Sigma = P^T P / n,$$

where $P^T P$ will be nonsingular with probability approaching one under conditions outlined
below. We can then plug in this estimator to obtain

$$\hat\beta = \int W(x) \hat d(x)\,dx.$$

We use this estimator as a second example.
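
The series plug-in estimator can be sketched as follows. This is a simplified illustration under stated assumptions: $x$ is scalar rather than $(p, y)$, the basis is a power series, the weight is $W(x) = 1(a < x < b)$ so that $\int W(x) p^K(x)\,dx$ has a closed form, and the data generating process is invented for the example.

```python
import random


def power_basis(x, K):
    """p^K(x) = (1, x, ..., x^(K-1)) for scalar x."""
    return [x ** k for k in range(K)]


def solve(A, b):
    """Solve A g = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    g = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * g[k] for k in range(r + 1, n))
        g[r] = s / M[r][r]
    return g


def series_coefficients(xs, qs, K):
    """gamma_hat = Sigma_hat^{-1} P'Q / n with Sigma_hat = P'P / n."""
    n = len(xs)
    P = [power_basis(x, K) for x in xs]
    sigma = [[sum(P[i][a] * P[i][b] for i in range(n)) / n for b in range(K)]
             for a in range(K)]
    pq = [sum(P[i][a] * qs[i] for i in range(n)) / n for a in range(K)]
    return solve(sigma, pq)


def plug_in_linear_functional(gamma, a, b):
    """beta_hat = int_a^b d_hat(x) dx, using int_a^b x^k dx in closed form."""
    return sum(g * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for k, g in enumerate(gamma))


# Hypothetical design: x ~ U(0, 1), d_0(x) = 1 + 2x - x^2, additive noise.
random.seed(2)
xs = [random.random() for _ in range(500)]
qs = [1.0 + 2.0 * x - x * x + random.gauss(0.0, 0.5) for x in xs]
gamma_hat = series_coefficients(xs, qs, K=3)
beta_hat = plug_in_linear_functional(gamma_hat, 0.2, 0.8)
print(f"beta_hat = {beta_hat:.4f}")
```

In this design the target is $\int_{0.2}^{0.8} (1 + 2x - x^2)\,dx = 1.032$, which the estimator recovers exactly when the noise is switched off and $K = 3$.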
This paper is about estimators that have an influence function. We and others refer to these
as asymptotically linear estimators. An asymptotically linear estimator is one satisfying

$$\sqrt{n}(\hat\beta - \beta_0) = \sum_{i=1}^{n} \psi(z_i)/\sqrt{n} + o_p(1), \qquad E[\psi(z_i)] = 0, \qquad E[\psi(z_i)^T \psi(z_i)] < \infty. \tag{2.1}$$

The function $\psi(z)$ is referred to as the influence function, following terminology of Hampel
(1968, 1974). It gives the influence of a single observation in the leading term of the expansion
in equation (2.1). It also quantifies the effect of a small change in the distribution on the limit
of $\hat\beta$, as we further explain below.

In the integrated squared density example the influence function is well known to be

$$\psi(z) = 2[f_0(z) - \beta_0].$$

This formula holds for the estimators mentioned above and for all other asymptotically linear
estimators of the integral of the square of an unrestricted pdf. In the consumer surplus example
the influence function is

$$\psi(z) = \delta(x)[q - d_0(x)], \qquad \delta(x) = f_0(x)^{-1} W(x),$$

as will be shown below.

3 Calculating the Influence Function 

In this Section we provide a method for calculating the influence function. The key object
on which the influence function depends is the limit of the estimator when $z_i$ has CDF $F$. We
denote this object by $\beta(F)$. It describes how the limit of the estimator varies as the distribution
of a data observation varies. Formally, it is a mapping from a set $\mathcal{F}$ of CDFs into the real line,

$$\beta(\cdot) : \mathcal{F} \to \mathbb{R}.$$

In the integrated squared density example $\beta(F) = \int f(z)^2\,dz$, where all elements of the domain
$\mathcal{F}$ are restricted to be continuous distributions with pdfs that are square integrable. In the
average surplus example $\beta(F) = \int W(x) E_F[q \mid x]\,dx$, where the domain is restricted to distributions
where $E_F[q \mid x]$ and $\beta(F)$ exist and $x$ is continuously distributed with pdf $f_0(x)$ that is positive
where $W(x)$ is positive.

We use how $\beta(F)$ varies with $F$ to calculate the influence function. Let $G_z^h$ denote a CDF
such that $(1 - t) F_0 + t G_z^h$ is in the domain $\mathcal{F}$ of $\beta(F)$ for small enough $t$ and $G_z^h$ approaches
a point mass at $z$ as $h \to 0$. For example, if $\mathcal{F}$ is restricted to continuous distributions then
we could take $G_z^h$ to be continuous with pdf $g_z^h(\tilde z) = h^{-r} K((\tilde z - z)/h)$ for $K(u)$ a bounded pdf
with bounded support and $\tilde z$ denoting a possible value of $z_i \in \mathbb{R}^r$. Under regularity conditions
given below the influence function can be calculated as

$$\psi(z) = \lim_{h \to 0} \left[ \frac{d}{dt} \beta\big((1 - t) \cdot F_0 + t \cdot G_z^h\big) \Big|_{t=0} \right]. \tag{3.2}$$

The derivative in this expression is the Gateaux derivative of the functional $\beta(F)$ with respect
to "contamination" $G_z^h$ to the true distribution $F_0$. Thus this formula says that the influence
function is the limit of the Gateaux derivative of $\beta(F)$ as the contamination distribution $G_z^h$
approaches a point mass at $z$.
For example, consider the integrated squared density where we let the contamination
distribution $G_z^h$ have a pdf $g_z^h(\tilde z) = h^{-r} K((\tilde z - z)/h)$ for a bounded kernel $K(u)$. Then

$$\frac{d}{dt} \beta\big((1 - t) \cdot F_0 + t \cdot G_z^h\big) \Big|_{t=0} = \frac{d}{dt} \int \big[(1 - t) f_0(\tilde z) + t g_z^h(\tilde z)\big]^2 d\tilde z \, \Big|_{t=0} = \int 2[f_0(\tilde z) - \beta_0]\, g_z^h(\tilde z)\,d\tilde z.$$

Assuming that $f_0(\tilde z)$ is continuous at $z$, the limit as $h \to 0$ is given by

$$\lim_{h \to 0} \left[ \frac{d}{dt} \beta\big((1 - t) \cdot F_0 + t \cdot G_z^h\big) \Big|_{t=0} \right] = 2 \lim_{h \to 0} \int f_0(\tilde z) g_z^h(\tilde z)\,d\tilde z - 2\beta_0 = 2[f_0(z) - \beta_0].$$

This function is the influence function at $z$ of semiparametric estimators of the integrated
squared density. Thus equation (3.2) holds in the example of an integrated squared density. As
we show below, equation (3.2), including the Gateaux differentiability, holds for any
asymptotically linear estimator satisfying certain mild regularity conditions.
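
The limit in this example can be checked numerically. The sketch below assumes a standard normal $f_0$, scalar $z$, and an Epanechnikov contamination density, none of which come from the paper. It compares a finite-difference Gateaux derivative of $\beta((1-t)F_0 + tG_z^h)$ with the closed-form derivative $\int 2[f_0 - \beta_0] g_z^h$ and with the influence function $2[f_0(z) - \beta_0]$ as $h$ shrinks.

```python
import math

BETA_0 = 1.0 / (2.0 * math.sqrt(math.pi))  # integral of f0^2 for N(0, 1)


def f0(s):
    """Assumed true pdf: standard normal."""
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)


def g(s, z, h):
    """Contamination pdf g_z^h(s) = h^{-1} K((s - z) / h), Epanechnikov K."""
    u = (s - z) / h
    return 0.75 * (1.0 - u * u) / h if abs(u) <= 1.0 else 0.0


def midpoint(func, lo, hi, m=2000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    step = (hi - lo) / m
    return step * sum(func(lo + (k + 0.5) * step) for k in range(m))


def beta_on_path(t, z, h):
    """beta((1 - t) F0 + t G_z^h): integral of the squared mixture pdf."""
    def squared_pdf(s):
        return ((1.0 - t) * f0(s) + t * g(s, z, h)) ** 2
    # Split at the edges of the kernel support, where g has kinks.
    cuts = [-10.0, z - h, z + h, 10.0]
    return sum(midpoint(squared_pdf, a, b) for a, b in zip(cuts, cuts[1:]))


def gateaux_finite_difference(z, h, t=1e-6):
    """One-sided difference quotient for the derivative in t at t = 0."""
    return (beta_on_path(t, z, h) - beta_on_path(0.0, z, h)) / t


def gateaux_formula(z, h):
    """Closed form of the derivative: integral of 2 [f0 - beta_0] g_z^h."""
    return midpoint(lambda s: 2.0 * (f0(s) - BETA_0) * g(s, z, h), z - h, z + h)


def influence(z):
    """psi(z) = 2 [f0(z) - beta_0]."""
    return 2.0 * (f0(z) - BETA_0)


for h in (0.5, 0.1, 0.01):
    print(h, gateaux_finite_difference(0.5, h), gateaux_formula(0.5, h),
          influence(0.5))
```

For a fixed bandwidth the derivative is a kernel-smoothed version of the influence function, so the gap between the two shrinks as $h \to 0$ wherever $f_0$ is continuous.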

Equation (3.2) can be thought of as a generalization of the influence function calculation of
Hampel (1968, 1974). That calculation is based on contamination $\delta_z$ that puts probability one
on $z_i = z$. If $(1 - t) \cdot F_0 + t \cdot \delta_z$ is in the domain $\mathcal{F}$ of $\beta(F)$ then the influence function is given by
the Gateaux derivative

$$\psi(z) = \frac{d}{dt} \beta\big((1 - t) \cdot F_0 + t \cdot \delta_z\big) \Big|_{t=0}.$$

The problem with this calculation is that $(1 - t) \cdot F_0 + t \cdot \delta_z$ will not be in the domain $\mathcal{F}$ for
many semiparametric estimators. It is not defined for the integrated squared density, average
consumer surplus, nor for any other $\beta(F)$ that is only well defined for continuous distributions.
Equation (3.2) circumvents this problem by restricting the contamination to be in $\mathcal{F}$. The
influence function is then obtained as the limit of a Gateaux derivative as the contamination
approaches a point mass, rather than the Gateaux derivative with respect to a point mass. This
generalization applies to most semiparametric estimators.

We can relate the influence function calculation here to the pathwise derivative
characterization of the influence function given in Van der Vaart (1991) and Newey (1994). Consider
$(1 - t) \cdot F_0 + t \cdot G_z^h$ as a path with parameter $t$ passing through the truth at $t = 0$. It turns out
that this path is exactly the right one to get the influence function from the pathwise derivative.
Suppose that $F_0$ has pdf $f_0$ and $G_z^h$ has density $g_z^h$ so that the likelihood corresponding to this
path is $(1 - t) \cdot f_0 + t \cdot g_z^h$. The derivative of the corresponding log-likelihood at zero, i.e. the
score, is $S(z_i) = g_z^h(z_i)/f_0(z_i) - 1$, where we do not worry about finite second moment of the
score for the moment. As shown by Van der Vaart (1991), the influence function will solve the
equation

$$\frac{d}{dt} \beta\big((1 - t) \cdot F_0 + t \cdot G_z^h\big) \Big|_{t=0} = E[\psi(z_i) S(z_i)] = \int \psi(\tilde z) \left[ \frac{g_z^h(\tilde z)}{f_0(\tilde z)} - 1 \right] f_0(\tilde z)\,d\tilde z = \int \psi(\tilde z) g_z^h(\tilde z)\,d\tilde z,$$

where the last equality uses $E[\psi(z_i)] = 0$. Taking the limit as $h \to 0$ then gives the formula (3.2)
for the influence function when the influence function is continuous at $z$. In this way
$F_t = (1 - t) \cdot F_0 + t \cdot G_z^h$ can be thought of as a path where the pathwise derivative converges to
the influence function as $g_z^h(\tilde z)$ approaches a point mass at $z$.

We give a theoretical justification for the formula in equation (3.2) by assuming that an
estimator is asymptotically linear and then showing that equation (3.2) is satisfied under a
few mild regularity conditions. One of the regularity conditions we use is local regularity of $\hat\beta$
along the path $F_t$. This property is that for any $t_n = O(1/\sqrt{n})$, when $z_1, \ldots, z_n$ are i.i.d. with
distribution $F_{t_n}$,

$$\sqrt{n}[\hat\beta - \beta(F_{t_n})] \xrightarrow{d} N(0, V), \qquad V = E[\psi(z_i) \psi(z_i)^T].$$

That is, under a sequence of local alternatives, when $\hat\beta$ is centered at $\beta(F_{t_n})$, then $\hat\beta$ has the same
limit in distribution as for $F_0$. This is a very mild regularity condition. Many semiparametric
estimators could be shown to be uniformly asymptotically normal for $t$ in a neighborhood of
0, which would imply this condition. Furthermore, it turns out that asymptotic linearity of
$\hat\beta$ and Gateaux differentiability of $\beta(F_t)$ at $t = 0$ are sufficient for local regularity. For these
reasons we view local regularity as a mild condition for the influence function calculation.

For simplicity we give a result for cases where $F_0$ is a continuous distribution with pdf $f_0$
and $\mathcal{F}$ includes paths $(1 - t) \cdot F_0 + t \cdot G_z^h$ where $G_z^h$ has pdf $g_z^h(\tilde z) = h^{-r} K((\tilde z - z)/h)$ and
$K(u)$ is a bounded pdf with bounded support. We also show below how this calculation can be
generalized to cases where the deviation need not be a continuous distribution.


Theorem 1: Suppose that $\hat\beta$ is asymptotically linear with influence function $\psi(\tilde z)$ that is
continuous at $z$ and $z_i$ is continuously distributed with pdf $f_0(\tilde z)$ that is bounded away from
zero on a neighborhood of $z$. If $\hat\beta$ is locally regular for the path $(1 - t) F_0 + t G_z^h$ then equation
(3.2) is satisfied. Furthermore, if $\beta((1 - t) F_0 + t G_z^h)$ is differentiable at $t = 0$ with derivative
$\int \psi(\tilde z) g_z^h(\tilde z)\,d\tilde z$ then $\hat\beta$ is locally regular.

This result shows that if an estimator is asymptotically linear and certain conditions are
satisfied then the influence function satisfies equation (3.2), justifying the calculation of the
influence function. Furthermore, the process of that calculation will generally show
differentiability of $\beta((1 - t) F_0 + t G_z^h)$ and so imply local regularity of the estimator, confirming one of the
hypotheses that is used to justify the formula. In this way this result provides a precise link
between the influence function of an estimator and the formula in equation (3.2).

This result is like Van der Vaart (1991) in showing that an asymptotically linear estimator
is regular if and only if its limit is pathwise differentiable. It differs in some of the regularity
conditions and in restricting the paths to have the mixture form $(1 - t) F_0 + t G_z^h$ with kernel
density contamination $G_z^h$. Such a restriction on the paths actually weakens the local regularity
hypothesis because $\hat\beta$ only has to be locally regular for a particular kind of path rather than a
general class of paths.

Although Theorem 1 assumes z is continuously distributed the calculation of the influence 
function will work for combinations of discretely and continuously distributed variables. For 
such cases the calculation can proceed with a deviation that is a product of a point mass 
for the discrete variables and a kernel density for the continuous variables. More generally,
only the variables that are restricted to be continuously distributed in the domain $\mathcal{F}$ need be
continuously distributed in the deviation.

We can illustrate using the consumer surplus example. Consider a deviation that is a
product of a point mass $\delta_q$ at some $q$ and a kernel density $g_x^h(\tilde x) = h^{-2} K((\tilde x - x)/h)$ centered
at $x = (p, y)$. The corresponding path is

$$F_t = (1 - t) F_0 + t\, \delta_q G_x^h,$$

where $G_x^h$ is the distribution corresponding to $g_x^h(\tilde x)$. Let $f_t(\tilde x) = (1 - t) f_0(\tilde x) + t g_x^h(\tilde x)$ be
the marginal pdf for $x$ along the path. Multiplying and dividing by $f_t(\tilde x)$ and using iterated
expectations we find that

$$\beta(F_t) = \int W(\tilde x) E_{F_t}[q \mid \tilde x]\,d\tilde x = \int f_t(\tilde x)^{-1} W(\tilde x) E_{F_t}[q \mid \tilde x] f_t(\tilde x)\,d\tilde x = E_{F_t}[f_t(x_i)^{-1} W(x_i) q_i].$$

Differentiating with respect to $t$ gives

$$\frac{d\beta(F_t)}{dt} \Big|_{t=0} = q \int \delta(\tilde x) g_x^h(\tilde x)\,d\tilde x - \beta_0 + \int (-1) f_0(\tilde x)^{-2} [g_x^h(\tilde x) - f_0(\tilde x)] W(\tilde x) E[q \mid \tilde x] f_0(\tilde x)\,d\tilde x$$
$$= q \int \delta(\tilde x) g_x^h(\tilde x)\,d\tilde x - \int \delta(\tilde x) E[q \mid \tilde x] g_x^h(\tilde x)\,d\tilde x.$$

Therefore, assuming that $\delta(\tilde x)$ is continuous at $x$ we have

$$\psi(z) = \lim_{h \to 0} \left[ \frac{d\beta(F_t)}{dt} \Big|_{t=0} \right] = \delta(x)(q - E[q \mid x]).$$

This result could also be derived using the results for conditional expectation estimators in
Newey (1994).
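
A numerical version of this derivation can be sketched under simplifying assumptions that are not in the paper: $x$ is scalar (so the kernel density is scaled by $h^{-1}$ rather than $h^{-2}$), and the marginal pdf, conditional mean, and weight below are hypothetical choices. Along the path, $E_{F_t}[q \mid \tilde x] = [(1-t) f_0(\tilde x) d_0(\tilde x) + t\,q\,g_x^h(\tilde x)] / f_t(\tilde x)$, which the code uses directly.

```python
def f0(s):
    """Assumed marginal pdf of x on [0, 1] (integrates to one)."""
    return 0.5 + s


def d0(s):
    """Assumed conditional mean E[q | x = s]."""
    return 1.0 + s * s


def weight(s):
    """Assumed known weight W(s) = 1(0.2 < s < 0.8)."""
    return 1.0 if 0.2 < s < 0.8 else 0.0


def g(s, x, h):
    """Contamination pdf for x: h^{-1} K((s - x) / h), Epanechnikov K."""
    u = (s - x) / h
    return 0.75 * (1.0 - u * u) / h if abs(u) <= 1.0 else 0.0


def midpoint(func, lo, hi, m=2000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    step = (hi - lo) / m
    return step * sum(func(lo + (k + 0.5) * step) for k in range(m))


def beta_on_path(t, q, x, h):
    """beta(F_t) = int W(s) E_{F_t}[q | s] ds for F_t = (1-t) F0 + t delta_q G_x^h."""
    def integrand(s):
        f_t = (1.0 - t) * f0(s) + t * g(s, x, h)  # marginal pdf of x under F_t
        mean_times_pdf = (1.0 - t) * f0(s) * d0(s) + t * q * g(s, x, h)
        return weight(s) * mean_times_pdf / f_t
    # Requires 0.2 < x - h and x + h < 0.8; split where g has kinks.
    cuts = [0.2, x - h, x + h, 0.8]
    return sum(midpoint(integrand, a, b) for a, b in zip(cuts, cuts[1:]))


def gateaux_finite_difference(q, x, h, t=1e-6):
    """One-sided difference quotient for d beta(F_t) / dt at t = 0."""
    return (beta_on_path(t, q, x, h) - beta_on_path(0.0, q, x, h)) / t


def influence(q, x):
    """psi(z) = delta(x) [q - d0(x)] with delta(x) = W(x) / f0(x)."""
    return weight(x) / f0(x) * (q - d0(x))


for h in (0.2, 0.05, 0.02):
    print(h, gateaux_finite_difference(3.0, 0.5, h), influence(3.0, 0.5))
```

At the evaluation point $(q, x) = (3, 0.5)$ this design gives $\delta(x) = 1$ and $d_0(x) = 1.25$, so the derivative should approach $1.75$ as $h$ shrinks.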

The fact that local regularity is necessary and sufficient for equation (3.2) highlights the
strength of the asymptotic linearity condition. Calculating the influence function is a good 
starting point for showing asymptotic linearity but primitive conditions for asymptotic linearity 
can be complicated and strong. For example, it is known that asymptotic linearity can require 
some degree of smoothness in underlying nonparametric functions, see Bickel and Ritov (1988). 
We next discuss regularity conditions for asymptotic linearity. 


4 Sufficient Conditions for Asymptotic Linearity 


One of the important uses of the influence function is to help specify regularity conditions 
for asymptotic linearity. The idea is that once $\psi(z)$ has been calculated we know what the
remainder term for asymptotic linearity must be. The remainder term can then be analyzed 
in order to formulate conditions for it to be small and hence the estimator be asymptotically 
linear. In this section we give one way to specify conditions for the remainder term to be 
small. It is true that this formulation may not lead to the weakest possible conditions for 
asymptotic linearity of a particular estimator. It is only meant to provide a useful way to 
formulate conditions for asymptotic linearity. 

In this section we consider estimators that are functionals of a nonparametric estimator
taking the form

$$\hat\beta = \beta(\hat F),$$

where $\hat F$ is some nonparametric estimator of the distribution of $z_i$. Both the integrated squared
density and the average consumer surplus estimators have this form, as discussed below. We
consider a more general class of estimators in Section 6.

Since $\beta_0 = \beta(F_0)$, adding and subtracting the term $\int \psi(z) \hat F(dz)$ gives

$$\sqrt{n}(\hat\beta - \beta_0) - \sum_{i=1}^{n} \psi(z_i)/\sqrt{n} = \sqrt{n} R_1(\hat F) + \sqrt{n} R_2(\hat F), \tag{4.3}$$

$$R_1(F) = \int \psi(z) F(dz) - \sum_{i=1}^{n} \psi(z_i)/n, \qquad R_2(F) = \beta(F) - \beta(F_0) - \int \psi(z) F(dz).$$

If $\sqrt{n} R_1(\hat F)$ and $\sqrt{n} R_2(\hat F)$ both converge in probability to zero then $\hat\beta$ will be asymptotically
linear. To the best of our knowledge little is gained in terms of clarity or relaxing conditions
by considering $R_1(\hat F) + R_2(\hat F)$ rather than $R_1(\hat F)$ and $R_2(\hat F)$ separately, so we focus on the
individual remainders.

The form of the remainders $R_1(F)$ and $R_2(F)$ is motivated by $\psi(z)$ being a derivative of
$\beta(F)$ with respect to $F$. The derivative interpretation of $\psi(z)$ suggests a linear approximation
of the form

$$\beta(F) \approx \beta(F_0) + \int \psi(z)(F - F_0)(dz) = \beta(F_0) + \int \psi(z) F(dz),$$

where the equality follows by $E[\psi(z_i)] = 0$. Plugging in $\hat F$ in this approximation gives $\int \psi(z) \hat F(dz)$
as a linear approximation to $\hat\beta - \beta_0$. The term $R_2(\hat F)$ is then the remainder from linearizing
$\hat\beta = \beta(\hat F)$ around $F_0$. The term $R_1(\hat F)$ is the difference between the linear approximation
$\int \psi(z) F(dz)$ evaluated at the nonparametric estimator $\hat F$ and at the empirical distribution $\tilde F$,
with $\int \psi(z) \tilde F(dz) = \sum_{i=1}^{n} \psi(z_i)/n$.

It is easy to fit the kernel estimator of the integrated squared density into this framework. We
let $\hat F$ be the CDF corresponding to a kernel density estimator $\hat f(z)$. Then for $\beta(F) = \int f(z)^2\,dz$,
the fact that $\hat f^2 - f_0^2 = (\hat f - f_0)^2 + 2 f_0 (\hat f - f_0)$ gives an expansion as in equation (4.3) with

$$R_1(\hat F) = \int \psi(z) \hat f(z)\,dz - \sum_{i=1}^{n} \psi(z_i)/n, \qquad R_2(\hat F) = \int [\hat f(z) - f_0(z)]^2\,dz.$$
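
The decomposition for the kernel example is an exact identity, $\hat\beta - \beta_0 = \sum_{i=1}^n \psi(z_i)/n + R_1(\hat F) + R_2(\hat F)$, and can be checked numerically. The sketch below assumes scalar data from a standard normal distribution, an Epanechnikov kernel, and midpoint-rule integration; these are illustrative choices only.

```python
import math
import random

BETA_0 = 1.0 / (2.0 * math.sqrt(math.pi))  # integral of f0^2 for N(0, 1)


def f0(z):
    """Assumed true pdf: standard normal."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def psi(z):
    """Influence function 2 [f0(z) - beta_0] of the integrated squared density."""
    return 2.0 * (f0(z) - BETA_0)


def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def f_hat(z, data, h):
    """Kernel density estimate at z for scalar data."""
    return sum(epanechnikov((z - zi) / h) for zi in data) / (len(data) * h)


def remainder_terms(data, h, lo=-8.0, hi=8.0, m=4000):
    """Return (beta_hat - beta_0, mean of psi(z_i), R1, R2) by the midpoint rule."""
    step = (hi - lo) / m
    int_f_hat_sq = int_psi_f_hat = r2 = 0.0
    for k in range(m):
        z = lo + (k + 0.5) * step
        fh = f_hat(z, data, h)
        int_f_hat_sq += fh * fh * step           # contributes to beta_hat
        int_psi_f_hat += psi(z) * fh * step      # linear term int psi dF_hat
        r2 += (fh - f0(z)) ** 2 * step           # R2 = int (f_hat - f0)^2
    psi_bar = sum(psi(zi) for zi in data) / len(data)
    r1 = int_psi_f_hat - psi_bar                 # R1 = int psi dF_hat - mean psi
    return int_f_hat_sq - BETA_0, psi_bar, r1, r2


random.seed(3)
sample = [random.gauss(0.0, 1.0) for _ in range(100)]
diff, psi_bar, r1, r2 = remainder_terms(sample, h=0.4)
print(f"beta_hat - beta_0 = {diff:.6f}")
print(f"psi_bar + R1 + R2 = {psi_bar + r1 + r2:.6f}")
```

The two printed numbers agree up to integration error, which separates the sampling term `psi_bar` from the smoothing remainder `r1` and the quadratic remainder `r2`.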


Applying this framework to a series regression estimator requires formulating that as an
estimator of a distribution $F$. One way to do that is to specify a conditional expectation operator
conditional on $x$ and a marginal distribution for $x$, since a conditional expectation operator
implies a conditional distribution. For a series estimator we can take $\hat F$ to have a conditional
expectation operator such that

$$E_{\hat F}[a(q, x) \mid x] = \frac{1}{n} \sum_{i=1}^{n} a(q_i, x)\, p^K(x_i)^T \hat\Sigma^{-1} p^K(x).$$

Then it will be the case that

$$\beta(\hat F) = \int W(x) E_{\hat F}[q \mid x]\,dx = \int W(x) \hat d(x)\,dx = \hat\beta,$$

which only depends on the conditional expectation operator, leaving us free to specify any
marginal distribution for $x$ that is convenient. Taking $\hat F$ to have a marginal distribution which
is the true distribution of the data we see that

$$\beta(\hat F) - \beta_0 = \int E_{\hat F}[W(x)\{q - d_0(x)\} \mid x]\,dx = \int E_{\hat F}[\psi(z) \mid x] f_0(x)\,dx = \int \psi(z) \hat F(dz).$$

In this case $R_2(\hat F) = 0$ and

$$R_1(\hat F) = \int E_{\hat F}[\psi(z) \mid x] f_0(x)\,dx - \frac{1}{n} \sum_{i=1}^{n} \psi(z_i).$$
Next we consider conditions for both of the remainder terms $R_1(\hat F)$ and $R_2(\hat F)$ to be small
enough so that $\hat\beta$ is asymptotically linear. The remainder term $R_1(\hat F) = \int \psi(z)(\hat F - \tilde F)(dz)$ is
the difference between a linear functional of the nonparametric estimator $\hat F$ and the same linear
functional of the empirical distribution $\tilde F$. It will shrink with the sample size due to $\hat F$ and
$\tilde F$ being nonparametric estimators of the distribution of $z_i$, meaning that they both converge
to $F_0$ as the sample size grows. This remainder will be the only one when $\beta(F)$ is a linear
functional of $F$.

This remainder often has an important expectation component that is related to the bias of
$\hat\beta$. Often $\hat F$ can be thought of as a result of some smoothing operation applied to the empirical
distribution. The $\hat F$ corresponding to a kernel density estimator is of course an example of
this. An expectation of $R_1(\hat F)$ can then be thought of as a smoothing bias for $\hat\beta$, or more
precisely a smoothing bias in the linear approximation term for $\hat\beta$. Consequently, requiring that
$\sqrt{n} R_1(\hat F) \xrightarrow{p} 0$ will include a requirement that $\sqrt{n}$ times this smoothing bias in $\hat\beta$ goes to zero.

Also $\sqrt{n}$ times the deviation of $R_1(\hat F)$ from an expectation will need to go to zero in order for
$\sqrt{n} R_1(\hat F) \xrightarrow{p} 0$. Subtracting an expectation from $\sqrt{n} R_1(\hat F)$ will generally result in a stochastic
equicontinuity remainder, which is bounded in probability for fixed $F$ and converges to zero as
$F$ approaches the empirical distribution. In the examples the resulting remainder goes to zero
under quite weak conditions.

To formulate a high level condition we will consider an expectation conditional on some
sigma algebra $\chi_n$ that can depend on all of the observations. This set up gives flexibility in the
specification of the stochastic equicontinuity condition.

Assumption 1: $E[R_1(\hat F) \mid \chi_n] = o_p(n^{-1/2})$ and $R_1(\hat F) - E[R_1(\hat F) \mid \chi_n] = o_p(n^{-1/2})$.

We illustrate this condition with the examples. For the integrated squared density let $\chi_n$ be a
constant so that the conditional expectation in Assumption 1 is the unconditional expectation.
Let $\psi(z, h) = \int \psi(z + hu) K(u)\,du$ and note that by a change of variables $u = (z - z_i)/h$ we have
$\int \psi(z) \hat f(z)\,dz = n^{-1} h^{-r} \sum_{i=1}^{n} \int \psi(z) K((z - z_i)/h)\,dz = \sum_{i=1}^{n} \psi(z_i, h)/n$. Then

$$E[R_1(\hat F)] = E[\psi(z_i, h)] = \int \left[ \int \psi(z + hu) f_0(z)\,dz \right] K(u)\,du, \tag{4.4}$$

$$R_1(\hat F) - E[R_1(\hat F)] = \frac{1}{n} \sum_{i=1}^{n} \{\psi(z_i, h) - E[\psi(z_i, h)] - \psi(z_i)\}.$$

Here $E[R_1(\hat F)]$ is the kernel bias for the convolution $\rho(t) = \int \psi(z + t) f_0(z)\,dz$ of the influence
function and the true pdf. It will be $o(n^{-1/2})$ under smoothness, kernel, and bandwidth
conditions that are further discussed below. The term $R_1(\hat F) - E[R_1(\hat F)]$ is evidently a stochastic
equicontinuity term that is $o_p(n^{-1/2})$ as long as $\lim_{h \to 0} E[\{\psi(z_i, h) - \psi(z_i)\}^2] = 0$.
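
The bias term in (4.4) can be made concrete in a case where $\rho(t)$ has a closed form. The sketch below assumes a standard normal $f_0$ with $\psi(z) = 2[f_0(z) - \beta_0]$, scalar $z$, and an Epanechnikov kernel. The bandwidth sequence $h = n^{-1/3}$ is only an illustrative choice for which $\sqrt{n} h^2 \to 0$, and is not a recommendation from the paper.

```python
import math

BETA_0 = 1.0 / (2.0 * math.sqrt(math.pi))  # integral of f0^2 for N(0, 1)


def rho(t):
    """rho(t) = int psi(z + t) f0(z) dz for psi = 2 [f0 - beta_0], f0 = N(0, 1).

    The convolution of two standard normal pdfs is the N(0, 2) pdf, which
    gives rho(t) = 2 beta_0 [exp(-t^2 / 4) - 1], so that rho(0) = 0.
    """
    return 2.0 * BETA_0 * (math.exp(-0.25 * t * t) - 1.0)


def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def smoothing_bias(h, m=2000):
    """E[R1] = int rho(h u) K(u) du, midpoint rule on the kernel support [-1, 1]."""
    step = 2.0 / m
    total = 0.0
    for k in range(m):
        u = -1.0 + (k + 0.5) * step
        total += rho(h * u) * epanechnikov(u)
    return total * step


for n in (100, 10_000, 1_000_000):
    h = n ** (-1.0 / 3.0)  # illustrative bandwidth with sqrt(n) h^2 -> 0
    print(n, round(h, 4), math.sqrt(n) * smoothing_bias(h))
```

For this second-order kernel the bias is approximately proportional to $h^2$, so $\sqrt{n}$ times the bias vanishes only if the bandwidth shrinks faster than $n^{-1/4}$.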
For the series estimator for consumer surplus let $\hat\delta(x) = [\int W(\tilde x) p^K(\tilde x)\,d\tilde x]^T \hat\Sigma^{-1} p^K(x)$ and
note that $\hat\beta = \sum_{i=1}^{n} \hat\delta(x_i) q_i / n$. Here we take $\chi_n = \{x_1, \ldots, x_n\}$. Then we have

$$E[R_1(\hat F) \mid \chi_n] = \frac{1}{n} \sum_{i=1}^{n} \hat\delta(x_i) d_0(x_i) - \beta_0, \qquad R_1(\hat F) - E[R_1(\hat F) \mid \chi_n] = \frac{1}{n} \sum_{i=1}^{n} [\hat\delta(x_i) - \delta(x_i)][q_i - d_0(x_i)]. \tag{4.5}$$

Here $E[R_1(\hat F) \mid \chi_n]$ is a series bias term that will be $o_p(n^{-1/2})$ under conditions discussed below.
The term $R_1(\hat F) - E[R_1(\hat F) \mid \chi_n]$ is a stochastic equicontinuity term that will be $o_p(n^{-1/2})$ as
$\hat\delta(x)$ gets close to $\delta(x)$. In particular, since $\hat\delta(x)$ depends only on $x_1, \ldots, x_n$, the expected square
of this term conditional on $\chi_n$ will be $n^{-2} \sum_{i=1}^{n} [\hat\delta(x_i) - \delta(x_i)]^2 \mathrm{Var}(q_i \mid x_i)$, which is $o_p(n^{-1})$ when
$\mathrm{Var}(q_i \mid x_i)$ is bounded and $n^{-1} \sum_{i=1}^{n} [\hat\delta(x_i) - \delta(x_i)]^2 = o_p(1)$.

Turning now to the other remainder $R_2(\hat F)$, we note that this remainder results from
linearizing around $F_0$. The size of this remainder is related to the smoothness properties of $\beta(F)$.
We previously used Gateaux differentiability of $\beta(F)$ along certain directions to calculate the
influence function. We need a stronger smoothness condition to make the remainder $R_2(\hat F)$ small.
Frechet differentiability is one helpful condition. If the functional $\beta(F)$ is Frechet differentiable
at $F_0$ then we will have

$$R_2(F) = o(\|F - F_0\|),$$

for some norm $\|\cdot\|$. Unfortunately Frechet differentiability is generally not enough for
$R_2(\hat F) = o_p(n^{-1/2})$. This problem occurs because $\beta(F)$ and hence $\|F - F_0\|$ may depend on features of
$F$ which cannot be estimated at a rate of $1/\sqrt{n}$. For the integrated squared density,
$\|F - F_0\| = \{\int [f(z) - f_0(z)]^2\,dz\}^{1/2}$ is the root integrated squared error. Consequently $\sqrt{n}\,\|\hat F - F_0\|$ is not
bounded in probability, and so the bound $R_2(\hat F) = o(\|\hat F - F_0\|)$ alone does not imply that
$\sqrt{n} R_2(\hat F)$ converges in probability to zero.

This problem can be addressed by specifying that $\|\hat F - F_0\|$ converges at some rate and
that $\beta(F)$ satisfies a stronger condition than Frechet differentiability. One condition that is
commonly used is that $R_2(F) = O(\|F - F_0\|^2)$. This condition will be satisfied if $\beta(F)$ is
twice continuously Frechet differentiable at $F_0$ or if the first Frechet derivative is Lipschitz. If it is also
assumed that $\hat F$ converges faster than $n^{-1/4}$ then $R_2(\hat F) = o_p(n^{-1/2})$ will be satisfied. A more
general condition that allows for larger $R_2(F)$ is given in the following hypothesis.


Assumption 2: For some $1 < \zeta \le 2$, $R_2(F) = O(\|F - F_0\|^{\zeta})$ and $\|\hat F - F_0\| = o_p(n^{-1/(2\zeta)})$.


This condition separates nicely into two parts, one about the properties of the functional and
another about a convergence rate for $\hat F$. For the case $\zeta = 2$ Assumption 2 has
previously been used to prove asymptotic linearity, e.g. by Ait-Sahalia (1991), Andrews (1994), Newey
(1994), Newey and McFadden (1994), Chen and Shen (1998), Chen, Linton, and Keilegom
(2003), and Ichimura and Lee (2010) among others.

In the example of the integrated squared density $R_2(F) = \int [f(z) - f_0(z)]^2\,dz = O(\|F - F_0\|^2)$
for $\|F - F_0\| = \{\int [f(z) - f_0(z)]^2\,dz\}^{1/2}$. Thus Assumption 2 will be satisfied with $\zeta = 2$ when
$\hat f$ converges to $f_0$ faster than $n^{-1/4}$ in the integrated squared error norm.

The following result formalizes the observation that Assumptions 1 and 2 are sufficient for
asymptotic linearity of $\hat\beta$.


Theorem 2: If Assumptions 1 and 2 are satisfied then $\hat\beta$ is asymptotically linear with
influence function $\psi(z)$.

An alternative set of conditions for asymptotic normality of $\sqrt{n}(\hat\beta - \beta_0)$ was given by
Ait-Sahalia (1991). Instead of using Assumption 1 Ait-Sahalia used the condition that $\sqrt{n}(\hat F - F_0)$
converged weakly as a stochastic process to the same limit as the empirical process.
Asymptotic normality of $\sqrt{n} \int \psi(z) \hat F(dz)$ then follows immediately by the functional delta method.
This approach is a more direct way to obtain asymptotic normality of the linear term in the
expansion. However weak convergence of $\sqrt{n}(\hat F - F_0)$ requires stronger conditions on the
nonparametric bias than does the approach adopted here. Also, Ait-Sahalia's (1991) approach does
not deliver asymptotic linearity, though it does give asymptotic normality.

These conditions for asymptotic linearity of semiparametric estimators are more complicated 
than the functional delta method outlined in Reeds (1976), Gill (1989), and Van der Vaart 
and Wellner (1996). The functional delta method gives asymptotic normality of a functional 
of the empirical distribution or other root-n consistent distribution estimator under just two 
conditions, Hadamard differentiability of the functional and weak convergence of the empirical 
process. That approach is based on a nice separation of conditions into smoothness conditions 
on the functional and statistical conditions on the estimated distribution. It does not appear 
to be possible to have such simple conditions for semiparametric estimators. One reason is
that they are only differentiable in norms where $\sqrt{n}\,\|\hat F - F_0\|$ is not bounded in probability. In
addition the smoothing inherent in $\hat F$ introduces a bias that depends on the functional and so
the weakest conditions are only attainable by accounting for interactions between the functional
and the form of $\hat F$. In the next Section we discuss this bias issue.


5 Linear Functionals 

In this Section we consider primitive conditions for Assumption 1 to be satisfied for kernel
density and series estimators. We focus on Assumption 1 because it is substantially more
complicated than Assumption 2. Assumption 2 will generally be satisfied when $\beta(F)$ is
sufficiently smooth and $\hat F$ converges at a fast enough rate in a norm. Such conditions are quite well
understood. Assumption 1 is more complicated because it involves both bias and stochastic
equicontinuity terms. The behavior of these terms seems to be less well understood than the
behavior of the nonlinear terms.

Assumption 1 being satisfied is equivalent to the linear functional $\beta(\hat F) = \int \psi(z) \hat F(dz)$ being
an asymptotically linear estimator. Thus conditions for linear functionals to be asymptotically
linear are also conditions for Assumption 1. For that reason it suffices to confine attention to
linear functionals in this Section. Also, for any linear functional of the form $\beta(F) = \int \zeta(z) F(dz)$
we can renormalize so that $\beta(F) - \beta_0 = \int \psi(z) F(dz)$ for $\psi(z) = \zeta(z) - E[\zeta(z_i)]$. Then without
loss of generality we can restrict attention to functionals $\beta(F) = \int \psi(z) F(dz)$ with $E[\psi(z_i)] = 0$.

5.1 Kernel Density Estimators 

Conditions for a linear functional of a kernel density estimator to be asymptotically linear
were stated though (apparently) not proven in Bickel and Ritov (2003). Here we give a brief
exposition of those conditions and a result. Let $z$ be an $r \times 1$ vector and $\hat F$ have pdf
$\hat f(z) = n^{-1}h^{-r}\sum_{i=1}^n K((z - z_i)/h)$. As previously noted, for
$\psi(z,h) = \int \psi(z + hu)K(u)\,du$ we have $\hat\beta = n^{-1}\sum_{i=1}^n \psi(z_i,h)$. To make
sure that the stochastic equicontinuity condition holds we assume:

Assumption 3: $K(u)$ is bounded with bounded support, $\int K(u)\,du = 1$, $\psi(z)$ is continuous
almost everywhere, and for some $\varepsilon > 0$, $E[\sup_{|t| \le \varepsilon} \psi(z_i + t)^2] < \infty$.

From Bickel and Ritov (2003, pp. 1035-1037) we know that the kernel bias for linear
functionals is that of a convolution. From equation (4.4) we see that

$$E[\hat\beta] - \beta_0 = \int \rho(hu)K(u)\,du, \quad \rho(t) = \int \psi(z + t)f_0(z)\,dz = \int \psi(\tilde z)f_0(\tilde z - t)\,d\tilde z.$$

Since $\rho(0) = 0$ the bias in $\hat\beta$ is the kernel bias for the convolution $\rho(t)$. A convolution
is smoother than the individual functions involved. Under quite general conditions the number of
derivatives of $\rho(t)$ that exist will equal the sum of the number of derivatives $s_f$ of $f_0(z)$
that exist and the number of derivatives $s_\psi$ of $\psi(z)$ that exist. The idea is that we can
differentiate the first expression for $\rho(t)$ with respect to $t$ up to $s_\psi$ times, do a change of
variables $\tilde z = z + t$, and then differentiate $s_f$ more times with respect to $t$ to see that
$\rho(t)$ is $s_\psi + s_f$ times differentiable. Consequently, the kernel smoothing bias for $\hat\beta$
behaves like the kernel bias for a function that is $s_\psi + s_f$ times differentiable. If a kernel of
order $s_f + s_\psi$ is used the bias of $\hat\beta$ will be of order $h^{s_\psi + s_f}$, which is smaller
than the bias order $h^{s_f}$ for the density. Intuitively, the integration inherent in a linear
functional is a smoothing operation and so leads to bias that is of smaller order than in estimation
of the density.
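
The convolution form of the bias can be checked numerically in a simple case. The following is a minimal sketch under assumptions that are not in the text: $z_i$ is scalar standard normal, $\psi(z) = z^2 - 1$ (so that $E[\psi(z_i)] = 0$ and $\rho(t) = t^2$), and $K$ is the Epanechnikov kernel, a second-order kernel. All function names are hypothetical. With these choices $\int \rho(hu)K(u)\,du = h^2/5$, so the bias is of the order allowed by the kernel rather than by the smoothness of the density.

```python
def epanechnikov(u):
    """Second-order kernel with support [-1, 1]; it integrates to one."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def psi(z):
    """Illustrative psi (an assumption): mean zero when z is standard normal."""
    return z * z - 1.0


def rho(t):
    """rho(t) = E[psi(z_i + t)] = t^2 for standard normal z_i."""
    return t * t


def integrate_against_kernel(fn, grid=4000):
    """Midpoint-rule approximation to the integral of fn(u) K(u) over [-1, 1]."""
    step = 2.0 / grid
    total = 0.0
    for j in range(grid):
        u = -1.0 + (j + 0.5) * step
        total += fn(u) * epanechnikov(u)
    return total * step


def smoothed_psi(z, h):
    """psi(z, h) = int psi(z + h u) K(u) du, the summand of the plug-in functional."""
    return integrate_against_kernel(lambda u: psi(z + h * u))


def kernel_bias(h):
    """int rho(h u) K(u) du, the bias of the plug-in functional at bandwidth h."""
    return integrate_against_kernel(lambda u: rho(h * u))
```

In this example `smoothed_psi(z, h)` equals $\psi(z) + h^2/5$ up to quadrature error, so averaging it over the sample shifts the estimator by exactly the bias returned by `kernel_bias(h)`.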

Some papers have used asymptotics for kernel based semiparametric estimators based on
the supposition that the bias of the semiparametric estimator is the same order as the bias of
the nonparametric estimator. Instead the order of the bias of $\hat\beta$ is the product of the orders
of kernel bias for $f_0(z)$ and $\psi(z)$ when the kernel is of high enough order. This observation is
made in Bickel and Ritov (2003). Newey, Hsieh, and Robins (2004) also showed this result for
a twicing kernel, but a twicing kernel is not needed, just any kernel of appropriate order.

As discussed in Bickel and Ritov (2003) a bandwidth that is optimal for estimation of $f_0$
may also give asymptotic linearity. To see this note that the optimal bandwidth for estimation
of $f_0$ is $n^{-1/(r + 2s_f)}$. Plugging in this bandwidth to a bias order of $h^{s_\psi + s_f}$ gives a
bias in $\hat\beta$ that goes to zero like $n^{-(s_\psi + s_f)/(r + 2s_f)}$. This bias will be smaller than
$n^{-1/2}$ for $s_\psi > r/2$. Thus, root-n consistency of $\hat\beta$ is possible with optimal bandwidth
for $\hat f$ when the number of derivatives of $\psi(z)$ is more than half the dimension of $z$. Such a
bandwidth will require use of an $s_\psi + s_f$ order kernel, which is higher order than is needed for
optimal estimation of $f_0$. Bickel and Ritov (2003) refer to nonparametric estimators that both
converge at optimal rates and for which linear functionals are root-n consistent as plug in
estimators, and stated $s_\psi > r/2$ as a condition for existence of a kernel based plug in estimator.
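
The exponent arithmetic behind the condition $s_\psi > r/2$ can be written out as a short calculation. This is only an illustrative sketch; the function names are hypothetical, and the rates are the ones stated in the paragraph above (bandwidth $n^{-1/(r+2s_f)}$, bias $h^{s_\psi+s_f}$).

```python
from fractions import Fraction


def optimal_bandwidth_exponent(s_f, r):
    """Exponent a in h = n^(-a) for the density-optimal bandwidth n^(-1/(r + 2 s_f))."""
    return Fraction(1, r + 2 * s_f)


def bias_rate_exponent(s_psi, s_f, r):
    """Exponent b such that the bias h^(s_psi + s_f) of the functional equals n^(-b)."""
    return (s_psi + s_f) * optimal_bandwidth_exponent(s_f, r)


def bias_is_root_n_negligible(s_psi, s_f, r):
    """True when the bias vanishes faster than n^(-1/2), which reduces to s_psi > r / 2."""
    return bias_rate_exponent(s_psi, s_f, r) > Fraction(1, 2)
```

For instance, with $r = 3$ and $s_f = 2$ the bias exponent is $4/7 > 1/2$ when $s_\psi = 2$ but only $3/7$ when $s_\psi = 1$, in line with the requirement $s_\psi > 3/2$.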

We now give a precise smoothness condition appropriate for kernel estimators. Let
$\lambda = (\lambda_1, \ldots, \lambda_r)^T$ denote a vector of nonnegative integers and
$|\lambda| = \sum_{j=1}^r \lambda_j$. Let
$\partial^\lambda f(z) = \partial^{|\lambda|} f(z)/\partial z_1^{\lambda_1} \cdots \partial z_r^{\lambda_r}$
denote the $\lambda$th partial derivative of $f(z)$ with respect to the components of $z$.


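Assumption 4: $f_0(z)$ is continuously differentiable of order $s_f$, $\psi(z)$ is continuously
differentiable of order $s_\psi$, $K(u)$ is a kernel of order $s_f + s_\psi$,
$\sqrt{n}\,h^{s_f + s_\psi} \to 0$, and there is $\varepsilon > 0$ such that for all
$\lambda, \lambda', \lambda''$ with $|\lambda| \le s_\psi$, $|\lambda'| = s_\psi$, and $|\lambda''| \le s_f$,

$$\int \sup_{|t| \le \varepsilon} |\partial^\lambda \psi(z + t)|\, f_0(z)\,dz < \infty, \quad \int |\partial^{\lambda'} \psi(z)| \sup_{|t| \le \varepsilon} |\partial^{\lambda''} f_0(z + t)|\,dz < \infty.$$
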


Here is a result on asymptotic linearity of kernel estimators of linear functionals. 

Theorem 3: If Assumptions 3 and 4 are satisfied then
$\int \psi(z)\hat F(dz) = \sum_{i=1}^n \psi(z_i)/n + o_p(n^{-1/2})$.


There are many previous results on asymptotic linearity of linear functionals of kernel density 
estimators. Newey and McFadden (1994) survey some of these. Theorem 3 differs from many of 
these previous results in Assumption 4 and the way the convolution form of the bias is handled. 
We follow Bickel and Ritov (2003) in this. 
5.2 Series Regression Estimators

Conditions for a linear functional of a series regression estimator to be asymptotically linear were
given in Newey (1994). It was shown there that the bias of a linear functional of a series
estimator is of smaller order than the bias of the series estimator. Here we provide an update
to those previous conditions using Belloni, Chernozhukov, Chetverikov, and Kato (2015) on
asymptotic properties of series estimators. We give conditions for asymptotic linearity of a
linear functional of a series regression estimator of the form

$$\hat\beta = \int W(x)\hat d(x)\,dx.$$

We give primitive conditions for the stochastic equicontinuity and bias terms from equation
(4.5) to be small.

Let $\hat\delta(x) = [\int W(x)p^K(x)\,dx]^T \hat\Sigma^{-1} p^K(x) = E[\delta(x_i)p^K(x_i)^T]\hat\Sigma^{-1}p^K(x)$
and $\delta(x) = f_0(x)^{-1}W(x)$ as described earlier. The stochastic equicontinuity term will be
small if $\sum_{i=1}^n [\hat\delta(x_i) - \delta(x_i)]^2/n \to_p 0$. Let $\Sigma = E[p^K(x_i)p^K(x_i)^T]$ and
$\gamma = \Sigma^{-1}E[p^K(x_i)d_0(x_i)]$ be the coefficients of the population regression of $d_0(x_i)$
on $p^K(x_i)$. Then for $\Gamma = E[\delta(x_i)p^K(x_i)]$ the bias term from equation (4.5) satisfies

$$\frac{1}{n}\sum_{i=1}^n \hat\delta(x_i)d_0(x_i) - \beta_0 = \Gamma^T \hat\Sigma^{-1} \sum_{i=1}^n p^K(x_i)[d_0(x_i) - p^K(x_i)^T\gamma]/n + E[\delta(x_i)\{p^K(x_i)^T\gamma - d_0(x_i)\}]. \quad (5.6)$$

The first term following the equality is a stochastic bias term that will be $o_p(n^{-1/2})$ under
relatively mild conditions from Belloni et al. (2015). For the coefficients
$\gamma_\delta = \Sigma^{-1}E[p^K(x_i)\delta(x_i)]$ of the population projection of $\delta(x_i)$ on
$p^K(x_i)$ the second term satisfies

$$E[\delta(x_i)\{p^K(x_i)^T\gamma - d_0(x_i)\}] = -E[\{\delta(x_i) - \gamma_\delta^T p^K(x_i)\}\{d_0(x_i) - p^K(x_i)^T\gamma\}],$$

where the equality holds by $d_0(x_i) - p^K(x_i)^T\gamma$ being orthogonal to $p^K(x_i)$ in the
population. As pointed out in Newey (1994), the size of this bias term is determined by the product
of series approximation errors to $\delta(x_i)$ and to $d_0(x_i)$. Thus, the bias of a series
semiparametric estimator will generally be smaller than the nonparametric bias for a series estimate
of $d_0(x)$. For example, for power series if $d_0(x)$ and $\delta(x)$ are continuously differentiable
of order $s_d$ and $s_\delta$ respectively, $x$ is $r$-dimensional, and the support of $x$ is compact
then by standard approximation theory,

$$|E[\{\delta(x) - \gamma_\delta^T p^K(x)\}\{d_0(x) - p^K(x)^T\gamma\}]| \le CK^{-(s_d + s_\delta)/r}.$$
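
The orthogonality argument and the product form of the bias can be illustrated numerically. The sketch below rests on assumptions that are not in the text: $x$ is scalar and uniform on $[-1,1]$, $p^K(x)$ consists of Legendre polynomials scaled to be orthonormal (so that $\Sigma = I$), $d_0(x) = e^x$, $\delta(x) = |x|$, and expectations are approximated by a midpoint rule. All function names are hypothetical.

```python
import math


def legendre_basis(x, K):
    """First K Legendre polynomials, scaled to be orthonormal under Uniform[-1, 1]."""
    values = [1.0, x]
    for k in range(1, K - 1):
        values.append(((2 * k + 1) * x * values[k] - k * values[k - 1]) / (k + 1))
    return [math.sqrt(2 * k + 1) * values[k] for k in range(K)]


def expectation(fn, grid=20000):
    """Midpoint-rule approximation to E[fn(x)] for x ~ Uniform[-1, 1]."""
    step = 2.0 / grid
    return sum(fn(-1.0 + (j + 0.5) * step) for j in range(grid)) * step / 2.0


def projection_coefficients(target, K):
    """Population regression coefficients of target(x) on the basis (Sigma = I)."""
    return [expectation(lambda x, k=k: legendre_basis(x, K)[k] * target(x))
            for k in range(K)]


def series_bias_terms(d0, delta, K):
    """Return both sides of the bias identity and the Cauchy-Schwarz product bound."""
    gamma = projection_coefficients(d0, K)
    gamma_delta = projection_coefficients(delta, K)

    def fit(coefs, x):
        return sum(c * p for c, p in zip(coefs, legendre_basis(x, K)))

    # E[delta (p'gamma - d0)]: the nonrandom bias term.
    lhs = expectation(lambda x: delta(x) * (fit(gamma, x) - d0(x)))
    # -E[(delta - p'gamma_delta)(d0 - p'gamma)]: product of approximation errors.
    rhs = -expectation(
        lambda x: (delta(x) - fit(gamma_delta, x)) * (d0(x) - fit(gamma, x)))
    err_d0 = math.sqrt(expectation(lambda x: (d0(x) - fit(gamma, x)) ** 2))
    err_delta = math.sqrt(
        expectation(lambda x: (delta(x) - fit(gamma_delta, x)) ** 2))
    return lhs, rhs, err_d0 * err_delta
```

The two expressions agree up to quadrature error, and their common absolute value is bounded by the product of the two root mean-square approximation errors, which is the quantity that the rate $CK^{-(s_d+s_\delta)/r}$ controls for smooth functions.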

As discussed in Newey (1994) it may be possible to use a $K$ that is optimal for estimation
of $d_0$ and also results in asymptotic linearity. If $s_\delta > r/2$ and $K$ is chosen to be optimal
for estimation of $d_0$ then $\sqrt{n}K^{-(s_d + s_\delta)/r} \to 0$. Thus, root-n consistency of
$\hat\beta$ is possible with the optimal number of terms for $d_0$ when the number of derivatives of
$\delta(x)$ is more than half the dimension of $x$.

Turning now to the regularity conditions for asymptotic linearity, we follow Belloni et al.
(2015) and impose the following assumption that takes care of the stochastic equicontinuity
condition and the random bias term:

Assumption 5: $Var(q_i|x_i)$ is bounded, $E[\delta(x_i)^2] < \infty$, the eigenvalues of
$\Sigma = E[p^K(x_i)p^K(x_i)^T]$ are bounded and bounded away from zero uniformly in $K$, there is a
set $\mathcal{X}$ with $\Pr(x_i \in \mathcal{X}) = 1$ and $c_K$ and $\ell_K$ such that
$\sqrt{E[\{d_0(x_i) - p^K(x_i)^T\gamma\}^2]} \le c_K$,
$\sup_{x \in \mathcal{X}} |d_0(x) - p^K(x)^T\gamma| \le \ell_K c_K$, and for
$\xi_K = \sup_{x \in \mathcal{X}} \|p^K(x)\|$, we have
$K/n + \sqrt{\xi_K^2(\ln K)/n}\,(1 + \sqrt{K}\ell_K c_K) + \ell_K c_K \to 0$.

The next condition takes care of the nonrandom bias term. 

Assumption 6: $\sqrt{E[\{\delta(x_i) - p^K(x_i)^T\gamma_\delta\}^2]} \le c_K^\delta$, $c_K^\delta \to 0$, and $\sqrt{n}\,c_K^\delta c_K \to 0$.

Belloni et al. (2015) give an extensive discussion of the size of $c_K$, $\ell_K$, and $\xi_K$ for
various kinds of series approximations and distributions for $x_i$. For power series Assumptions 5
and 6 are satisfied with $c_K = CK^{-s_d/r}$, $c_K^\delta = CK^{-s_\delta/r}$, $\ell_K = K$,
$\xi_K = K$, and

$$\sqrt{K^2(\ln K)/n}\,(1 + K^{3/2}K^{-s_d/r}) + K^{1 - (s_d/r)} \to 0, \quad \sqrt{n}K^{-(s_d + s_\delta)/r} \to 0.$$

For tensor product splines of order $o$, Assumptions 5 and 6 are satisfied with
$c_K = CK^{-\min\{s_d, o\}/r}$, $c_K^\delta = CK^{-\min\{s_\delta, o\}/r}$, $\ell_K = C$,
$\xi_K = \sqrt{K}$, and

$$\sqrt{K(\ln K)/n}\,(1 + \sqrt{K}K^{-\min\{s_d, o\}/r}) \to 0, \quad \sqrt{n}K^{-(\min\{s_d, o\} + \min\{s_\delta, o\})/r} \to 0.$$

Theorem 4: If Assumptions 5 and 6 are satisfied then for $\psi(z) = \delta(x)[q - d_0(x)]$ we have
$\int W(x)\hat d(x)\,dx = \beta_0 + \sum_{i=1}^n \psi(z_i)/n + o_p(n^{-1/2})$.

Turning now to the consumer surplus bound example, note that in this case $W(x)$ is not
even continuous so that $\delta(x)$ is not continuous. This generally means that one cannot assume
a rate at which $c_K^\delta$ goes to zero. As long as $p^K(x)$ can provide arbitrarily good mean-square
approximation to any square integrable function, then $c_K^\delta \to 0$ as $K$ grows. Then Assumption
6 will be satisfied provided that $\sqrt{n}\,c_K$ is bounded. Therefore for power series it suffices
for asymptotic linearity of the series estimator of the bound that

$$\sqrt{K^2(\ln K)/n}\,(1 + K^{3/2}K^{-s_d/2}) + K^{1 - (s_d/2)} \to 0, \quad \sqrt{n}K^{-s_d/2} \le C.$$

For this condition to hold it suffices that $d_0(x)$ is three times differentiable, $K^2\ln(K)/n \to 0$,
and $K^3/n$ is bounded away from zero. For regression splines it suffices that

$$\sqrt{K(\ln K)/n}\,(1 + \sqrt{K}K^{-\min\{s_d, o\}/2}) \to 0, \quad \sqrt{n}K^{-\min\{s_d, o\}/2} \le C.$$

For this condition to hold it suffices that the splines are of order at least 2, $d_0(x)$ is twice
differentiable, $K\ln(K)/n \to 0$ and $K^2/n$ is bounded away from zero. Here we find weaker
sufficient conditions for a spline based estimator to be asymptotically linear than for a power
series estimator.

6 Semiparametric GMM Estimators 

A more general class of semiparametric estimators that has many applications is the class of
generalized method of moments (GMM) estimators that depend on nonparametric estimators.
Let $m(z, \beta, F)$ denote a vector of functions of the data observation $z$, parameters of interest
$\beta$, and a distribution $F$. A GMM estimator can be based on a moment condition where $\beta_0$
is the unique parameter vector satisfying

$$E[m(z_i, \beta_0, F_0)] = 0.$$

That is, we assume that this moment condition identifies $\beta_0$.

Semiparametric single index estimation provides examples. For the conditional mean
restriction, the model assumes the conditional mean function to only depend on the index, so
that $E(y|x) = \phi(x^T\theta_0)$. With a normalization imposed, the first regressor coefficient is 1
so that $\theta_0 = (1, \beta_0^T)^T$. Let $\theta = (1, \beta^T)^T$. Ichimura (1993) showed that under
some regularity conditions,

$$\min_\beta E\{[y - E(y|x^T\theta)]^2\}$$

identifies $\beta_0$. Thus in this case, $z = (x, y)$ and

$$m(z, \beta, F) = \frac{\partial\{[y - E_F(y|x^T\theta)]^2\}}{\partial\beta}.$$

For the conditional median restriction, the model assumes the conditional median function
$M(y|x)$ to only depend on the index, so that $M(y|x) = \phi(x^T\theta_0)$. Ichimura and Lee (2010)
showed that under some regularity conditions,

$$\min_\beta E\{|y - M(y|x^T\theta)|\}$$

identifies $\beta_0$. Thus in this case,

$$m(z, \beta, F) = \frac{\partial\{|y - M_F(y|x^T\theta)|\}}{\partial\beta}.$$
Let $x = (x_1, \tilde x^T)^T$. Note that at $\beta = \beta_0$, the derivative of $E(y|x^T\theta)$ with
respect to $\beta$ equals

$$\phi'(x^T\theta_0)[\tilde x - E(\tilde x|x^T\theta_0)].$$

Thus the target parameter $\beta_0$ satisfies the first order condition

$$0 = E\{\phi'(x^T\theta_0)[\tilde x - E(\tilde x|x^T\theta_0)][y - E(y|x^T\theta_0)]\}.$$

Analogously, at $\beta = \beta_0$, the derivative of $M(y|x^T\theta)$ with respect to $\beta$ equals

$$\phi'(x^T\theta_0)[\tilde x - E(\tilde x|x^T\theta_0)]/f_{y|x}(M(y|x^T\theta_0)|x).$$

Thus the target parameter $\beta_0$ satisfies the first order condition

$$0 = E\{\phi'(x^T\theta_0)[\tilde x - E(\tilde x|x^T\theta_0)][2 \cdot 1\{y < M(y|x^T\theta_0)\} - 1]/f_{y|x}(M(y|x^T\theta_0)|x)\}.$$

Estimators of $\beta_0$ can often be viewed as choosing $\hat\beta$ to minimize a quadratic form in
sample moments evaluated at some estimator $\hat F$ of $F_0$. For
$\hat m(\beta) = \sum_{i=1}^n m(z_i, \beta, \hat F)/n$ and $\hat W$ a positive semi-definite weighting
matrix the GMM estimator is given by

$$\hat\beta = \arg\min_\beta \hat m(\beta)^T \hat W \hat m(\beta).$$

In this Section we discuss conditions for asymptotic linearity of this estimator.

For this type of nonlinear estimator showing consistency generally precedes showing
asymptotic linearity. Conditions for consistency are well understood. For differentiable
$\hat m(\beta)$ asymptotic linearity of $\hat\beta$ will follow from an expansion of $\hat m(\hat\beta)$
around $\beta_0$ in the first order conditions. This gives

$$\sqrt{n}(\hat\beta - \beta_0) = -(\hat M^T \hat W \bar M)^{-1}\hat M^T \hat W \sqrt{n}\,\hat m(\beta_0),$$

with probability approaching one, where $\hat M = \partial\hat m(\hat\beta)/\partial\beta$,
$\bar M = \partial\hat m(\bar\beta)/\partial\beta$, and $\bar\beta$ is a mean value that actually differs
from row to row of $\bar M$. Assuming that $\hat W \to_p W$ for positive semi-definite $W$, and that
$\hat M \to_p M = E[\partial m(z_i, \beta_0, F_0)/\partial\beta]$ and $\bar M \to_p M$, it will follow
that $(\hat M^T \hat W \bar M)^{-1}\hat M^T \hat W \to_p (M^T W M)^{-1}M^T W$. Then asymptotic
linearity of $\hat\beta$ will follow from asymptotic linearity of $\hat m(\beta_0)$.

With an additional stochastic equicontinuity condition like that of Andrews (1994),
asymptotic linearity of $\hat m(\beta_0)$ will follow from asymptotic linearity of functionals of
$\hat F$. For $F \in \mathcal{F}$ let $\mu(F) = E[m(z_i, \beta_0, F)]$ and

$$\hat R_3(F) = \frac{1}{n}\sum_{i=1}^n \{m(z_i, \beta_0, F) - m(z_i, \beta_0, F_0) - \mu(F)\}.$$




Note that $\sqrt{n}\hat R_3(F)$ is the difference of two objects that are bounded in probability (by
$E[m(z_i, \beta_0, F_0)] = 0$) and differ only when $F$ is different than $F_0$. Assuming that
$m(z_i, \beta_0, F)$ is continuous in $F$ in an appropriate sense we would expect that
$\sqrt{n}\hat R_3(F)$ should be close to zero when $F$ is close to $F_0$. As long as $\hat F$ is close
to $F_0$ in large samples in that sense, i.e. is consistent in the right way, then we expect that the
following condition holds.

Assumption 7: $\sqrt{n}\hat R_3(\hat F) \to_p 0$.

This condition will generally be satisfied when the nonparametrically estimated functions
are sufficiently smooth with enough derivatives that are uniformly bounded and the space of
functions in which $\hat F$ lies is not too complex; see Andrews (1994) and Van der Vaart and Wellner
(1996). Under Assumption 7 asymptotic linearity of $\mu(\hat F)$ will suffice for asymptotic linearity
of $\sqrt{n}\hat m(\beta_0)$. To see this suppose that $\mu(\hat F)$ is asymptotically linear with
influence function $\varphi(z)$. Then under Assumption 7 and by
$\mu(F_0) = E[m(z_i, \beta_0, F_0)] = 0$,

$$\sqrt{n}\,\hat m(\beta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^n m(z_i, \beta_0, F_0) + \sqrt{n}\mu(\hat F) + o_p(1) = \frac{1}{\sqrt{n}}\sum_{i=1}^n [m(z_i, \beta_0, F_0) + \varphi(z_i)] + o_p(1).$$

Thus Assumption 7 and asymptotic linearity of $\mu(\hat F)$ suffice for asymptotic linearity of
$\hat m(\beta_0)$ with influence function $m(z, \beta_0, F_0) + \varphi(z)$. In turn these conditions
and others will imply that $\hat\beta$ is asymptotically linear with influence function

$$\psi(z) = -(M^T W M)^{-1}M^T W[m(z, \beta_0, F_0) + \varphi(z)].$$
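
As a small worked instance of this formula, the sketch below evaluates $-(M^TWM)^{-1}M^TW[m + \varphi]$ for a single observation. It is restricted, as a simplifying assumption, to a scalar parameter with $q$ moment conditions, so that $M$ is a $q$-vector and $M^TWM$ is a scalar. The function name and the numerical inputs are hypothetical.

```python
def gmm_influence(M, W, moment_plus_correction):
    """Evaluate -(M'WM)^{-1} M'W [m + phi] for a scalar parameter.

    M is the q-vector of moment derivatives, W is a q x q weighting matrix given
    as a list of rows, and moment_plus_correction is the q-vector
    m(z, beta_0, F_0) + phi(z) at one observation z.
    """
    q = len(M)
    # Row vector M'W.
    MtW = [sum(M[i] * W[i][j] for i in range(q)) for j in range(q)]
    # Scalar M'WM, which must be nonzero (the nonsingularity condition).
    MtWM = sum(MtW[j] * M[j] for j in range(q))
    if MtWM == 0.0:
        raise ValueError("M'WM must be nonsingular")
    return -sum(MtW[j] * moment_plus_correction[j] for j in range(q)) / MtWM
```

With `M = [1.0, 2.0]` and the identity weighting matrix, a moment vector `[1.0, 0.0]` gives the value `-0.2`; putting more weight on the first moment, say `W = [[2.0, 0.0], [0.0, 1.0]]`, changes it to `-1/3`, which shows how the weighting matrix enters the influence function when the model is overidentified.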

The influence function $\varphi(z)$ of $\mu(F) = E[m(z_i, \beta_0, F)]$ can be viewed as a correction
term for estimation of $F_0$. It can be calculated from equation (3.2) applied to the functional
$\mu(F)$. Assumptions 1 and 2 can be applied with $\beta(F) = \mu(F)$ for regularity conditions for
asymptotic linearity of $\mu(\hat F)$. Here is a result doing so.

Theorem 5: If $\hat\beta \to_p \beta_0$, $\hat W \to_p W$, $\hat m(\beta)$ is continuously
differentiable in a neighborhood of $\beta_0$ with probability approaching 1, for any
$\bar\beta \to_p \beta_0$ we have $\partial\hat m(\bar\beta)/\partial\beta \to_p M$, $M^T W M$
is nonsingular, Assumptions 1 and 2 are satisfied for $\beta(F) = E[m(z_i, \beta_0, F)]$ and
$\psi(z) = \varphi(z)$, and Assumption 7 is satisfied then $\hat\beta$ is asymptotically linear with
influence function $-(M^T W M)^{-1}M^T W[m(z, \beta_0, F_0) + \varphi(z)]$.

Alternatively, Assumption 7 can be used to show that the GMM estimator is asymptotically 
equivalent to the estimator studied in Section 4. 

For brevity we do not give a full set of primitive regularity conditions for the general GMM 
setting. They can be formulated using the results above for linear functionals as well as Frechet 
differentiability, convergence rates, and primitive conditions for Assumption 7. 
7 Conclusion


In this paper we have given a method for calculating the influence function of a semiparametric
estimator. We have also considered ways to use that calculation to formulate regularity
conditions for asymptotic linearity. We intend to take up elsewhere the use of the influence 
function in bias corrected semiparametric estimation. Shen (1995) considered optimal robust 
estimation among some types of semiparametric estimators. Further work on robustness of the 
kinds of estimators considered here may be possible. Other work on the influence function of 
semiparametric estimators may also be of interest. 


8 Appendix A: Proofs 


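Proof of Theorem 1: Note that in a neighborhood of $t = 0$,
$[(1 - t)f_0(\tilde z) + tg_z^h(\tilde z)]^{1/2}$ is continuously differentiable and we have

$$|s_t(\tilde z)| = \left|\frac{\partial}{\partial t}\left[(1 - t)f_0(\tilde z) + tg_z^h(\tilde z)\right]^{1/2}\right| = \frac{1}{2}\,\frac{|g_z^h(\tilde z) - f_0(\tilde z)|}{[tg_z^h(\tilde z) + (1 - t)f_0(\tilde z)]^{1/2}} \le C\,\frac{g_z^h(\tilde z) + f_0(\tilde z)}{f_0(\tilde z)^{1/2}}.$$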


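By $f_0(\tilde z)$ bounded away from zero on a neighborhood of $z$ and the support of
$g_z^h(\tilde z)$ shrinking to zero as $h \to 0$ it follows that there is a bounded set $B$ with
$g_z^h(\tilde z)/f_0(\tilde z)^{1/2} \le C1(\tilde z \in B)$ for $h$ small enough. Therefore, it follows
that

$$\int \left[\frac{g_z^h(\tilde z) + f_0(\tilde z)}{f_0(\tilde z)^{1/2}}\right]^2 d\tilde z \le C\left[\int 1(\tilde z \in B)\,d\tilde z + 1\right] < \infty.$$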


Then by the dominated convergence theorem $[(1 - t)f_0(\tilde z) + tg_z^h(\tilde z)]^{1/2}$ is
mean-square differentiable and $I(t) = \int s_t(\tilde z)^2\,d\tilde z$ is continuous in $t$ on a
neighborhood of zero for all $h$ small enough. Also, by $g_z^h(\tilde z) \to 0$ for all
$\tilde z \ne z$ and $f_0(\tilde z) > 0$ on a neighborhood of $z$ it follows that
$g_z^h(\tilde z) \ne f_0(\tilde z)$ for all $t$ and $h$ small enough and hence $I(t) > 0$. Then by
Theorem 7.2 and Example 6.5 of Van der Vaart (1998) it follows that for any $t_n = O(1/\sqrt{n})$ a
vector of $n$ observations $(z_1, \ldots, z_n)$ that is i.i.d. with pdf
$f_{t_n}(\tilde z) = (1 - t_n)f_0(\tilde z) + t_n g_z^h(\tilde z)$ is contiguous to a vector of $n$
observations with pdf $f_0(\tilde z)$. Therefore,

$$\sqrt{n}(\hat\beta - \beta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi(z_i) + o_p(1)$$

holds when $(z_1, \ldots, z_n)$ are i.i.d. with pdf $f_{t_n}(\tilde z)$.

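Next by $\psi(\tilde z)$ continuous at $z$, $\psi(\tilde z)$ is bounded on a neighborhood of $z$.
Therefore for small enough $h$, $\int \|\psi(\tilde z)\|^2 g_z^h(\tilde z)\,d\tilde z < \infty$, and hence
$\int \|\psi(\tilde z)\|^2 f_t(\tilde z)\,d\tilde z = (1 - t)\int \|\psi(\tilde z)\|^2 f_0(\tilde z)\,d\tilde z + t\int \|\psi(\tilde z)\|^2 g_z^h(\tilde z)\,d\tilde z$
is continuous in $t$ in a neighborhood of $t = 0$. Also, for
$\mu_z^h = \int \psi(\tilde z)g_z^h(\tilde z)\,d\tilde z$ note that
$\int \psi(\tilde z)f_t(\tilde z)\,d\tilde z = t\mu_z^h$.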
Suppose $(z_1, \ldots, z_n)$ are i.i.d. with pdf $f_{t_n}(\tilde z)$. Let
$\beta(t) = \beta((1 - t)F_0 + tG_z^h)$ and $\beta_n = \beta(t_n)$. Adding and subtracting terms,

$$\sqrt{n}(\hat\beta - \beta_n) = \sqrt{n}(\hat\beta - \beta_0) - \sqrt{n}(\beta_n - \beta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi(z_i) + o_p(1) - \sqrt{n}(\beta_n - \beta_0)$$

$$= \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi_n(z_i) + o_p(1) + \sqrt{n}\,t_n\mu_z^h - \sqrt{n}(\beta_n - \beta_0), \quad \psi_n(z_i) = \psi(z_i) - t_n\mu_z^h.$$

Note that $\int \psi_n(\tilde z)f_{t_n}(\tilde z)\,d\tilde z = 0$. Also, for large enough $n$,

$$\lim_{M \to \infty} \int 1(\|\psi_n(\tilde z)\| > M)\|\psi_n(\tilde z)\|^2 f_{t_n}(\tilde z)\,d\tilde z \le \lim_{M \to \infty} C\int 1(\|\psi(\tilde z)\| > M/2)(\|\psi(\tilde z)\|^2 + C)f_0(\tilde z)\,d\tilde z = 0,$$

so the Lindeberg-Feller condition for a central limit theorem is satisfied. Furthermore, it follows
by similar calculations that
$\int \psi_n(\tilde z)\psi_n(\tilde z)^T f_{t_n}(\tilde z)\,d\tilde z \to V$. Therefore, by the
Lindeberg-Feller central limit theorem, $\sum_{i=1}^n \psi_n(z_i)/\sqrt{n} \to_d N(0, V)$. Therefore
we have $\sqrt{n}(\hat\beta - \beta_n) \to_d N(0, V)$ if and only if

$$\sqrt{n}\,t_n\mu_z^h - \sqrt{n}(\beta_n - \beta_0) \to 0. \quad (8.7)$$

Suppose that $\beta(t)$ is differentiable at $t = 0$ with derivative $\mu_z^h$. Then

$$\sqrt{n}(\beta_n - \beta_0) - \sqrt{n}\,t_n\mu_z^h = \sqrt{n}\,o(t_n) = \sqrt{n}\,t_n\,o(1) \to 0$$

by $\sqrt{n}\,t_n$ bounded. Next, we follow the proof of Theorem 2.1 of Van der Vaart (1991), and
suppose that eq. (8.7) holds for all $t_n = O(1/\sqrt{n})$. Consider any sequence $r_m \to 0$. Let
$n_m$ be the subsequence such that

$$(1 + n_m)^{-1/2} < r_m \le n_m^{-1/2}.$$

Let $t_n = r_m$ for $n = n_m$ and $t_n = n^{-1/2}$ for $n \notin \{n_1, n_2, \ldots\}$. By construction,
$t_n = O(1/\sqrt{n})$, so that eq. (8.7) holds. Therefore it also holds along the subsequence $n_m$,
so that

$$\sqrt{n_m}\,r_m\left\{\mu_z^h - \frac{\beta(r_m) - \beta_0}{r_m}\right\} = \sqrt{n_m}\,r_m\mu_z^h - \sqrt{n_m}[\beta(r_m) - \beta_0] \to 0.$$

By construction $\sqrt{n_m}\,r_m$ is bounded away from zero, so that
$\mu_z^h - [\beta(r_m) - \beta_0]/r_m \to 0$. Since $r_m$ is any sequence converging to zero it follows
that $\beta(t)$ is differentiable at $t = 0$ with derivative $\mu_z^h$.

We have now shown that eq. (8.7) holds for all sequences $t_n = O(1/\sqrt{n})$ if and only if
$\beta(t)$ is differentiable at $t = 0$ with derivative $\mu_z^h$. Furthermore, as shown above eq. (8.7)
holds if and only if $\hat\beta$ is regular. Thus we have shown that $\hat\beta$ is regular if and only
if $\beta(t)$ is differentiable at $t = 0$ with derivative $\mu_z^h$.
Finally note that as $h \to 0$ it follows from continuity of $\psi(\tilde z)$ at $z$, $K(u)$ bounded
with bounded support, and the dominated convergence theorem that

$$\mu_z^h = \int \psi(\tilde z)g_z^h(\tilde z)\,d\tilde z = h^{-r}\int \psi(\tilde z)K((\tilde z - z)/h)\,d\tilde z = \int \psi(z + hu)K(u)\,du \to \psi(z). \quad Q.E.D.$$
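
The calculation in this proof can be illustrated with a functional whose influence function is known in closed form. The sketch below uses assumptions that are not in the text: the functional is the variance, $\beta(F) = Var_F(z)$ for scalar $z$, with influence function $\psi(z) = (z - \mu_0)^2 - \sigma_0^2$, and $G_z^h$ is the uniform distribution on $[z - h, z + h]$ (a uniform kernel). All function names are hypothetical. The derivative of $\beta((1 - t)F_0 + tG_z^h)$ at $t = 0$ is approximated by a one-sided finite difference and compared with $\psi(z)$ as $h$ becomes small.

```python
def contaminated_variance(t, z, h, mu0=0.0, sigma0_sq=1.0):
    """Variance of the mixture (1 - t) F_0 + t G_z^h.

    F_0 has mean mu0 and variance sigma0_sq; G_z^h is uniform on [z - h, z + h],
    which has mean z and second moment z^2 + h^2 / 3.
    """
    mean = (1.0 - t) * mu0 + t * z
    second_moment = (1.0 - t) * (sigma0_sq + mu0 * mu0) + t * (z * z + h * h / 3.0)
    return second_moment - mean * mean


def gateaux_derivative(z, h, t=1e-6, mu0=0.0, sigma0_sq=1.0):
    """Finite-difference approximation to d beta((1 - t) F_0 + t G_z^h) / dt at t = 0."""
    perturbed = contaminated_variance(t, z, h, mu0, sigma0_sq)
    baseline = contaminated_variance(0.0, z, h, mu0, sigma0_sq)
    return (perturbed - baseline) / t


def variance_influence_function(z, mu0=0.0, sigma0_sq=1.0):
    """Known influence function of the variance functional."""
    return (z - mu0) ** 2 - sigma0_sq
```

For fixed $h$ the derivative equals $\int \psi(\tilde z)g_z^h(\tilde z)\,d\tilde z = \psi(z) + h^2/3$ in this example, and it approaches $\psi(z)$ as $h \to 0$, which is the limit taken in the last step of the proof.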


Proof of Theorem 2: This follows as outlined in the text from Assumptions 1 and 2 and
eq. (4.3) and the fact that if several random variables converge in probability to zero then so
does their sum. Q.E.D.

Proof of Theorem 3: By the first dominance condition of Assumption 4,
$\int \psi(z + t)f_0(z)\,dz$ is continuously differentiable with respect to $t$ up to order $s_\psi$ in
a neighborhood of zero and for all $\lambda$ with $|\lambda| \le s_\psi$,

$$\partial^\lambda \int \psi(z + t)f_0(z)\,dz = \int \partial^\lambda \psi(z + t)f_0(z)\,dz.$$

For any $\lambda$ with $|\lambda| = s_\psi$ it follows by a change of variables $\tilde z = z + t$ and the
second dominance condition that

$$\int \partial^\lambda \psi(z + t)f_0(z)\,dz = \int \partial^\lambda \psi(\tilde z)f_0(\tilde z - t)\,d\tilde z$$

is continuously differentiable in $t$ up to order $s_f$ in a neighborhood of zero and that for any
$\lambda'$ with $|\lambda'| \le s_f$,

$$\partial^{\lambda'} \int \partial^\lambda \psi(\tilde z)f_0(\tilde z - t)\,d\tilde z = \int \partial^\lambda \psi(\tilde z)\,\partial^{\lambda'} f_0(\tilde z - t)\,d\tilde z.$$

Therefore $\rho(t) = \int \psi(z + t)f_0(z)\,dz$ is continuously differentiable of order
$s_\psi + s_f$ in a neighborhood of zero. Since $\rho(0) = 0$ and $K(u)$ has bounded support and is
of order $s_\psi + s_f$ the usual expansion for kernel bias gives

$$E[\hat\beta] - \beta_0 = \int \rho(hu)K(u)\,du = O(h^{s_\psi + s_f}).$$

Therefore, $E[\sqrt{n}\hat R_1(\hat F)] \to 0$.

Next, by continuity almost everywhere of $\psi(z)$ in Assumption 3 it follows that
$\psi(z_i + hu) \to \psi(z_i)$ as $h \to 0$ with probability one (w.p.1). Also, by Assumption 3
$\sup_{|t| \le \varepsilon} |\psi(z_i + t)|$ is finite w.p.1, so that by $K(u)$ having bounded support
and the dominated convergence theorem, w.p.1,

$$\psi(z_i, h) = \int \psi(z_i + hu)K(u)\,du \to \psi(z_i).$$

Furthermore, for $h$ small enough

$$\psi(z_i, h)^2 \le C\sup_{|t| \le \varepsilon} \psi(z_i + t)^2,$$

so it follows by the dominated convergence theorem that
$E[\{\psi(z_i, h) - \psi(z_i)\}^2] \to 0$ as $h \to 0$. Therefore,

$$Var(\sqrt{n}\hat R_1(\hat F)) = Var\left(n^{-1/2}\sum_{i=1}^n \{\psi(z_i, h) - \psi(z_i)\}\right) \le E[\{\psi(z_i, h) - \psi(z_i)\}^2] \to 0.$$

Since the expectation and variance of $\sqrt{n}\hat R_1(\hat F)$ converge to zero it follows that
Assumption 1 is satisfied. Assumption 2 is satisfied because $\beta(F)$ is a linear functional, so the
conclusion follows by Theorem 2. Q.E.D.


Proof of Theorem 4: Since everything in the remainders is invariant to nonsingular
linear transformations of $p^K(x)$ it can be assumed without loss of generality that
$\Sigma = E[p^K(x_i)p^K(x_i)^T] = I$. Let $\bar\delta(x_i) = \Gamma^T p^K(x_i) = \gamma_\delta^T p^K(x_i)$
so that by Assumption 6, $E[\{\bar\delta(x_i) - \delta(x_i)\}^2] \to 0$. Note that by $Var(q_i|x_i)$
bounded and the Markov inequality,

$$\sum_{i=1}^n \{\hat\delta(x_i) - \delta(x_i)\}^2 Var(q_i|x_i)/n \le C\sum_{i=1}^n \{\hat\delta(x_i) - \delta(x_i)\}^2/n$$

$$\le C\sum_{i=1}^n \{\bar\delta(x_i) - \delta(x_i)\}^2/n + C\sum_{i=1}^n \{\Gamma^T(\hat\Sigma^{-1} - I)p^K(x_i)\}^2/n$$

$$\le o_p(1) + C\,\Gamma^T(\hat\Sigma^{-1} - I)\hat\Sigma(\hat\Sigma^{-1} - I)\Gamma = o_p(1),$$

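where the last equality follows as in Step 1 of the proof of Lemma 4.1 of Belloni et al. (2015).
We also have

$$\Gamma^T\Gamma = E[\delta(x_i)p^K(x_i)^T]\Sigma^{-1}E[\delta(x_i)p^K(x_i)] = E[\{\gamma_\delta^T p^K(x_i)\}^2].$$

By $c_K^\delta \to 0$ it follows that $E[\{\gamma_\delta^T p^K(x_i)\}^2] \to E[\delta(x_i)^2] > 0$, so that
$\Gamma \ne 0$. Let $\tilde\Gamma = \Gamma/(\Gamma^T\Gamma)^{1/2}$, so that
$\tilde\Gamma^T\tilde\Gamma = 1$. Note that

$$\Gamma^T\hat\Sigma^{-1}\sum_{i=1}^n p^K(x_i)[d_0(x_i) - p^K(x_i)^T\gamma]/n = \Gamma^T(\hat\gamma - \gamma), \quad \hat\gamma = \hat\Sigma^{-1}\sum_{i=1}^n p^K(x_i)d_0(x_i)/n.$$

Let $R_{1n}(\tilde\Gamma)$ and $R_{2n}(\tilde\Gamma)$ be defined by the equations

$$\sqrt{n}\,\tilde\Gamma^T(\hat\gamma - \gamma) = \tilde\Gamma^T\sum_{i=1}^n p^K(x_i)[d_0(x_i) - p^K(x_i)^T\gamma]/\sqrt{n} + R_{1n}(\tilde\Gamma) = R_{1n}(\tilde\Gamma) + R_{2n}(\tilde\Gamma).$$
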

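By eqs. (4.12) and (4.14) of Lemma 4.1 of Belloni et al. (2015) and by Assumption 5 we have

$$R_{1n}(\tilde\Gamma) = O_p\left(\sqrt{\xi_K^2(\ln K)/n}\,(1 + \sqrt{K}\ell_K c_K)\right) \to_p 0, \quad R_{2n}(\tilde\Gamma) = O_p(\ell_K c_K) \to_p 0.$$

Noting that $\Gamma^T\Gamma \le E[\delta(x_i)^2] = O(1)$, we have

$$\sqrt{n}\,\Gamma^T\hat\Sigma^{-1}\sum_{i=1}^n p^K(x_i)[d_0(x_i) - p^K(x_i)^T\gamma]/n = (\Gamma^T\Gamma)^{1/2}\sqrt{n}\,\tilde\Gamma^T(\hat\gamma - \gamma) = O(1)o_p(1) \to_p 0.$$
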
Also, note that $E[p^K(x_i)\{d_0(x_i) - p^K(x_i)^T\gamma\}] = 0$, so that by the Cauchy-Schwarz
inequality,

$$\sqrt{n}\,|E[\delta(x_i)\{d_0(x_i) - p^K(x_i)^T\gamma\}]| = \sqrt{n}\,|E[\{\delta(x_i) - p^K(x_i)^T\gamma_\delta\}\{d_0(x_i) - p^K(x_i)^T\gamma\}]| \le \sqrt{n}\,c_K^\delta c_K \to 0.$$

Then the conclusion follows by the triangle inequality and eq. (5.6). Q.E.D.

Proof of Theorem 5: As discussed in the text it suffices to prove that $\hat m(\beta_0)$ is
asymptotically linear with influence function $m(z, \beta_0, F_0) + \varphi(z)$. By Assumption 7 it
follows that

$$\hat m(\beta_0) = \frac{1}{n}\sum_{i=1}^n m(z_i, \beta_0, F_0) + \mu(\hat F) + o_p(n^{-1/2}).$$

Also, by the conclusion of Theorem 2 and $\mu(F_0) = 0$ we have

$$\mu(\hat F) = \frac{1}{n}\sum_{i=1}^n \varphi(z_i) + o_p(n^{-1/2}).$$

By the triangle inequality it follows that

$$\hat m(\beta_0) = \frac{1}{n}\sum_{i=1}^n [m(z_i, \beta_0, F_0) + \varphi(z_i)] + o_p(n^{-1/2}). \quad Q.E.D.$$